Speech signal distortion measurement which varies as a function of the distribution of measured distortion over time and frequency

ABSTRACT

Telecommunications testing apparatus includes an analyzer arranged to receive a distorted signal which corresponds to a test signal when distorted by telecommunications apparatus to be tested. The analyzer periodically derives, from the distorted signal, a plurality of spectral component signals responsive to the distortion in each of a plurality of spectral bands, over a succession of time intervals. The analyzer generates a measure of the subjective impact of the distortion due to the telecommunications apparatus, the measure of subjective impact being calculated to depend upon the spread of the distortion over time and/or over the spectral bands.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a method and apparatus for testing telecommunications apparatus.

2. Related Art

In testing telecommunications apparatus (for example, a telephone line, a telephone network, or communications apparatus such as a codec) a test signal is introduced to the input of the telecommunications apparatus, and some test is applied to the resulting output of the apparatus. It is known to derive "objective" test measurements, such as the signal to noise ratio, which can be calculated by automatic processing apparatus. It is also known to apply "subjective" tests, in which a human listener listens to the output of the telecommunications apparatus, and gives an opinion as to the quality of the output.

Some elements of telecommunications systems are linear. Accordingly, it is possible to apply simple artificial test signals, such as discrete frequency sine waves, swept sine signals or chirp signals, random or pseudo random noise signals, or impulses. The output signal can then be analyzed using, for example, Fast Fourier Transform (FFT) or some other spectral analysis technique. One or more such simple test signals are sufficient to characterise the behaviour of a linear system.

On the other hand, modern telecommunications systems include an increasing number of elements which are nonlinear and/or time variant. For example, modern low bit-rate digital speech codecs, forming part of mobile telephone systems, have a nonlinear response, while automatic gain controls (AGCs), voice activity detectors (VADs) and associated voice switches, and burst errors contribute time variations to the telecommunications systems of which they form part.

Accordingly, it is increasingly less possible to use simple test methods developed for linear systems to derive objective measures of the distortion or acceptability of telecommunications apparatus.

The low correlation between objective measures of system performance or distortion and the subjective response of a human user of the system means that such subjective testing remains the best way of testing telecommunications apparatus. However, subjective testing by using human listeners is expensive, time-consuming, difficult to perform, and inconsistent.

Recently, in the paper "Measuring the Quality of Audio Devices" by John G. Beerends and Jan A. Stemerdink, presented at the 90th AES Convention, 1991 Feb. 19-22, Paris, printed in AES Preprints as Preprint 3070 (L-8) by the Audio Engineering Society, it has been proposed to measure the quality of a speech codec for digital mobile radio by using, as test signals, a database of real recorded speech and analyzing the corresponding output of the codec using a perceptual analysis method designed to correspond in some aspects to the processes which are thought to occur in the human ear.

It has also been proposed (for example in "Objective Measurement Method for Estimating Speech Quality of Low Bit Rate Speech Coding", Irii, Kurashima, Kitawaki and Itoh, NTT Review, Vol. 3, No. 5, September 1991) to use an artificial voice signal (i.e. a signal which is similar in a spectral sense to the human voice, but which does not convey any intelligence) in conjunction with a conventional distortion analysis measure such as the cepstral distance (CD) measure, to measure the performance of telecommunications apparatus.

It would appear obvious, when testing apparatus such as a codec which is designed to encode human speech, and when employing an analysis method based on the human ear, to use real human speech samples as was proposed in the above paper by Beerends and Stemerdink. In fact, however, the performance of such test systems is not particularly good.

BRIEF SUMMARY OF THE INVENTION

Our earlier International application PCT/GB93/01322 published on 6th Jan. 1994 as WO94/00922 (now parent U.S. application Ser. No. 08/351,421 filed Dec. 12, 1994) discloses a test system using an artificial speech test signal and a perceptual model analysis method.

Accordingly, it is an object of the invention to provide an improved telecommunications testing apparatus and method. It is another object of the invention to provide a telecommunications testing apparatus which can provide a measure of the performance of a telecommunications system which matches the subjective human perception of the performance of the system.

In a paper "NMR and "Masking Flag": Evaluation of Quality using Perceptual Criteria", Brandenburg and Sporer demonstrate the display of various distortion effects as plots of distortion amplitude against frequency and against time (using real speech and music samples as in the Beerends paper). However, these results do not give a quantitative measure of the distortion. Moreover, the plot of amplitude against time gives no spectral information, and vice versa. Beerends and Stemerdink, in a paper entitled "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation" (Journal of the Audio Engineering Society Audio/Acoustics/Applications Vol 40, No. 12, December 1992 pages 963-978), discuss a quantitative measurement of distortion L_(N). This measure is an integral over time and frequency (adjusted to a non-linear (pitch) scale) of a distortion value which is related to loudness of the error signal, but modified according to time and pitch to introduce masking effects, threshold values and other perceptual factors.

The present invention provides, in one aspect, telecommunications testing apparatus comprising analysis means arranged to receive a distorted signal which corresponds to a test signal when distorted by telecommunications apparatus to be tested, the analysis means comprising means for deriving, from the distorted signal, a plurality of spectral component signals responsive to the distortion in each of a plurality of spectral bands, over a succession of time intervals, the analysis means being arranged to generate a measure of the subjective impact of the distortion due to the telecommunications apparatus, said measure of subjective impact being calculated to depend upon the distribution of the distortion over time and said spectral bands.

Viewed in another aspect, the invention provides a method of assessing the distortion caused by telecommunications apparatus, in which the spectral and temporal distribution of the distortion is used to assess the perceived impact of the distortion.

This invention provides a measure of the distortion which is related to the distribution of the distortion over the time and spectral domains, or its concentration in a small area of these domains. Conveniently, the analysis means (8) is arranged to derive a measure E_(E), referred to herein as "error entropy", of the distribution of said distortion over time and said spectral bands, and is further arranged to derive a measure E_(A) of the total amount of said distortion over a predetermined time segment, and to calculate a measure Y_(LE) of said subjective impact based on said measures of distribution E_(E) and total distortion E_(A). This value Y_(LE) has a correlation with the perceptual importance of the distortion.

Preferably the measure of distribution E_(E) is determined as the sum over all time intervals (i) and spectral bands (j) of the value: -a(i,j)·ln(a(i,j)), where a(i,j) is the absolute magnitude of the distortion in a predetermined time interval (i), and spectral band (j), expressed as a proportion of the total distortion over all time intervals (i) and spectral bands (j).

If an error amplitude scale using logarithmic units is used, the values of a(i,j) are conveniently related exponentially to the scale units.

Other aspects and preferred embodiments of the invention will be apparent from the following description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be illustrated, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 is a block diagram showing the arrangement of an embodiment of the invention in use;

FIG. 2 is a block diagram showing in greater detail the components of an embodiment of the invention;

FIG. 3 is a block diagram showing in greater detail a test signal generator forming part of the embodiment of FIG. 2;

FIG. 4 shows schematically the structure of a test signal over time;

FIG. 5a is a graph of the level of masked noise (dBs) against a pitch (e.g. approximately logarithmic frequency) axis in critical band rate (Bark) units, for different levels of masking noise; and

FIG. 5b is a diagram showing the variation of excitation threshold on a pitch (approximately logarithmic frequency) axis in critical band rate (Bark) units, for masking noise at seven given frequencies;

FIG. 6 is a block diagram showing in greater detail an analysis unit forming part of the embodiment of FIG. 2;

FIGS. 7a and 7b form a flow diagram indicating schematically the operation of the analysis unit in the embodiment of FIG. 6;

FIG. 8a shows schematically an estimate formed in this embodiment of amplitude of excitation, as a function of time and pitch, which would be produced in the human ear by a predetermined speech-like signal; and

FIG. 8b is a corresponding plot showing the excitation which would be produced by two spaced clicks;

FIG. 9a is a diagram of distortion amplitude over pitch and time axes representing a low magnitude nonlinear distortion of the speech signal depicted in FIG. 8a;

FIG. 9b corresponds to FIG. 9a but with higher amplitude nonlinear distortion;

FIG. 9c corresponds to FIG. 9a but with the substitution of MNRU distortion;

FIG. 9d corresponds to FIG. 9a but with the substitution of crossover distortion; and

FIG. 9e corresponds to FIG. 9a but with the substitution of clipping distortion due to a voice activity detector;

FIG. 10a shows substantially a plot of distortion amplitude over time and pitch axes for homogeneous distortion;

FIG. 10b is a table showing the amplitude values for the cells of the plot of FIG. 10a;

FIG. 11a is a plot corresponding to FIG. 10a for a first non-homogeneous distortion; and

FIG. 11b is a corresponding table of amplitude values;

FIG. 12a is a plot corresponding to FIG. 10a for a second non-homogeneous distortion;

FIG. 12b is a corresponding table of amplitude values; and

FIG. 12c is a table of amplitude values corresponding to the multiplication of the distortion of FIG. 12a by a factor of 10;

FIG. 13 is a graph relating error magnitude to level of distortion of one example of imposed MNRU distortion;

FIG. 14 is a graph relating error distribution to imposed distortion in the same example;

FIG. 15 is a graph relating a subjective assessment of distortion by human listeners to imposed distortion in the same example; and

FIG. 16 shows part of the graph of FIG. 15, together with a predicted subjective level of distortion derived according to the invention from the data of FIGS. 13 and 14.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Overview of Apparatus

Referring to FIG. 1, telecommunications apparatus 1 comprises an input port 2 and an output port 3. Test apparatus 4 comprises an output port 5 for coupling to the input port 2 of the telecommunications apparatus under test, and an input port 6 for coupling to the output port 3 of the telecommunications apparatus under test.

Referring to FIG. 2, the test apparatus 4 comprises a test signal generator 7 coupled to the output port 5, for supplying a speech-like test signal thereto, and a signal analyzer unit 8 coupled to the input port 6 for analyzing the signal received from the telecommunications apparatus 1. As will be discussed in greater detail below, the analyzer 8 also utilises an analysis of the test signal generated by the test signal generator 7, and this is indicated in this embodiment by a path 9 running from the output port 5 to the input port 6.

Also provided from the analysis unit 8 is a measurement signal output port 10 at which a signal indicating some measure of the acceptability of the telecommunications apparatus (for example, distortion) is provided either for subsequent processing, or for display on a visual display unit (VDU), not shown.

First Embodiment

Speech Signal Generation

In its simplest form, the artificial speech generator may merely comprise a digital store 71 (e.g. a hard disc or digital audio tape) containing stored digital data from which a speech signal can be reconstituted. The stored data may be individual digitised speech samples, which are supplied in succession from the store 71 to a signal reconstituting means 72 (e.g. a digital to analog convertor (DAC)) connected to the output port 5. The sample data stored in the store 71 comprises one or more speech utterances lasting several seconds in length (for example, on the order of ten seconds).

Alternatively, the store 71 may store speech data in the form of filter coefficients to drive an LPC speech synthesizer, for example, or higher level data (e.g. phoneme, pitch and intensity data) to drive a phoneme synthesizer comprising the reconstituting means.

A control circuit 73 (e.g. a microprocessor) controls the operation of the store unit 71 to select a particular test signal to be output.

Referring to FIG. 4, the test signal data stored in the store 71 is reconstituted to form a test signal comprising a plurality of segments t₀, t₁, t₂ . . . t_(n).

Each of the segments t₀ -t_(n) typically corresponds to a different speech sound (e.g. a different phoneme) or to silence. One known artificial voice test signal is disclosed in CCITT Recommendation P50 (Recommendation on Artificial Voices, Vol. Rec P50, Melbourne 1988, published by CCITT). In the P50 test signal, each segment lasts 60 ms.

The segments are grouped into patterns each comprising a randomly selected sequence of 16 predetermined spectral patterns, defined by the recommendation, with spectrum densities S_(i) (f) equal to ##EQU1##

The transition between the different segments in each pattern is arranged to be smooth. Of the patterns, 13 correspond to voiced speech and the remaining 3 to unvoiced speech. A sequence of speech can either be stored on a recording medium and reproduced, or can be generated from stored data using a vocoder as described in the above referenced Irii paper, for example.

The P50 signal has a long term and short term spectral similarity to speech when averaged over about 10 seconds. Accordingly, preferably, the speech sequence shown in FIG. 4 lasts at least this long. Certain types of process, referred to below as "processes with memory", exist in which the behaviour of a current speech element varies according to which speech element preceded it. These are not tested for adequately by the standard P50 signal because speech elements are selected randomly in that signal.

In an improvement over the standard P50 signal, the sequence of predetermined spectral patterns is not selected randomly, but instead the sequence is selected to represent sequences which occur in spoken language. This ensures that processes with memory are appropriately exercised by the test signal.

Distortion

The signal leaving the telecommunications apparatus 1 under test differs from the test signal supplied to the input port 2. Firstly, there will be time-invariant linear distortions of the signal, resulting in overall changes of amplitude, and in filtering of the signal so as to change its spectral shape. Secondly, noise will be added to the signal from various sources, including constant noise sources (such as thermal noise) and discontinuous sources (such as noise bursts, dialling pulses, interference spikes and crossed lines). Thirdly, there will be nonlinear and time-varying distortions of the signal due to nonlinear elements such as codecs and time-varying elements such as echo cancellers and thresholders.

The presence of nonlinear distortion can cause intermodulation between noise and the signal, and the distortion at the output port 3 therefore depends not only upon the signal and the apparatus 1 but also the noise. Further, the presence of time-varying distortion, triggered by the signal but acting after a delay, means that the distortion applied to any given temporal portion of the signal depends upon preceding temporal portions of the signal and noise; for instance, if high level noise is present before the beginning of a phoneme, a voice activity detector may not clip the phoneme at all, whereas if the phoneme is preceded by silence, the voice activity detector will heavily clip the beginning of the phoneme causing substantial distortion.

Analyzer 8

The analysis according to the present invention is intended to provide an acceptability signal output which depends upon the distortion of the test signal similarly to the response of a human ear, as it is presently understood.

Without dwelling upon the physical or biological mechanisms giving rise to these phenomena, it is well known that the human perception of sound is affected by several factors. Firstly, the presence of one sound "masks" (i.e. suppresses the perception of) another sound in a similar spectral (frequency) region. The extent to which the other sound is masked depends upon, firstly, how close in pitch it is to the first sound and, secondly, the amplitude of the first sound.

Thus, the human perception of errors or distortions in a sound depends upon the sound itself; errors of low amplitude in the same spectral region as the sound itself may be masked and correspondingly be inaudible (as, for example, occur with quantising errors in sub band coding).

Secondly, the masking phenomenon has some time dependence. A sound continues to mask other sounds for a short period after the sound is removed; the amplitudes of the subsequent sounds which will be masked decay rapidly after the removal of the first sound. Thus, errors or distortions will be masked not only by the present signal but also by portions of the signal which preceded it (to a lesser extent). This is referred to as "forward masking". It is also found that the application of a high level sound just after a lower level sound which would otherwise have been audible retrospectively makes the earlier sound inaudible. This is referred to as "backward masking".

Thirdly, the human ear is not directly responsive to the frequency, but to the phenomenon perceived as "pitch" of a sound, which corresponds to a nonlinear warping of the frequency axis.

Fourthly, the human ear is not directly responsive to amplitude, even when a signal is not masked, but to the phenomenon perceived as loudness which is a nonlinear function of amplitude.

Accordingly, in this embodiment the analyzer 8 is arranged to process the signal received from the telecommunications equipment 1 to determine how significant or objectionable the distortion produced thereby in the test signal will be to a human listener, in accordance with the above known characteristics of the human ear.

More particularly, the analysis unit 8 is arranged to determine what the response of the human ear will be to the test signal generated by the test signal generator 7; and then to similarly process the signal from the telecommunications apparatus output 3 to determine the extent to which it perceptibly differs from the original test signal, by determining the extent to which distortions are perceivable.

FIG. 5a shows schematically the variation of the spectral masking threshold (the threshold below which a second sound is obscured by a first) for narrow band noise at a fixed frequency. The five curves are for progressively higher levels of masking noise, and it will be seen that the effect of increasing the level of masking noise is to cause a roughly linear increase in the masking threshold at the masking noise frequency, but also to change the shape of the threshold away from the noise frequency (predominantly towards higher frequencies). The masking effect is therefore amplitude nonlinear with respect to the amplitude of the masking noise.

For a given masking noise level, the width (measured, for example, at the 3 dB points below the central masking frequency) of the masked spectral band varies with the frequency of the masking noise. This variation of the width of the masked bands is related to the characteristic of the human auditory filter shape for frequency discrimination, and therefore to the human perception of pitch.

Accordingly, as shown in FIG. 5b, a scale of pitch, rather than frequency, can be generated from the frequency scale by warping the frequency scale, so as to create a new scale in which the widths of masking bands are constant. FIG. 5b shows the critical band rate, or Bark, scale which is derived by considering a set of narrow band masking tones at different frequencies which cross at the -3 dB point. This scale is described, for example, in "Audio Engineering and Psychoacoustics: Matching Signals to the Final Receiver, the Human Auditory System", J. Audio Eng. Soc. Vol. 39, March 1991, Zwicker and Zwicker.

The critical bands shown in FIG. 5b are similar in shape (on the frequency axis) below 500 hertz when represented on a linear frequency scale. Above 500 hertz, they are similar in shape when viewed on a logarithmic frequency scale. Since the telephony band width is typically 300 to 3150 hertz, and telecommunications apparatus is often band limited to between these limits, the transformation to the pitch scale in this embodiment ignores the linear region below 500 hertz with only a small compromise in accuracy.

Referring to FIG. 6, the analysis unit 8 comprises an analog to digital converter (ADC) 81 arranged to receive signals from the input port 6 and produce a corresponding digital pulse train; an arithmetic processor 82 (for example, a microprocessor such as the Intel 80486 processor, or a digital signal processing device such as the Western Electric DSP 32C or the Texas Instruments TMS C30 device), coupled to receive the digital output of the ADC 81, a memory device 83 storing instruction sequences for the processor 82 and providing working memory for storing arithmetic results, and an output line 84 from the processor 82 connected to the output 10.

Referring to FIGS. 7a and 7b, the processes performed by the processor 82 in this embodiment will now be described.

Firstly, the test signal supplied from the test signal generator 7 is input directly to the input port 6 in a step 100, without passing through telecommunications apparatus 1.

In the next step 101, the signal from the ADC 81 is filtered by a filter which corresponds to the transfer function between the outer portions of the ear and the inner ear. The filtering may typically be performed by executing a digital filtering operation in accordance with filter data stored in the memory 83. The filter may be characterised by a transfer function of the type described in "Psychoacoustic models for evaluating errors in audio systems", J. R. Stuart, Procs. IOA, vol. 13, part 7, 1991.

In fact, the transfer function to the inner ear will vary slightly depending upon whether the sound is coupled closely to the ear (e.g. through a headset) or more distantly (e.g. from a loudspeaker); accordingly, the processor 82 and store 83 may be arranged to store the characteristics of several different transfer functions corresponding to different sound locations related to the type of telecommunications apparatus 1 on test, and to select an appropriate filter in response to a user input specifying the telecommunications apparatus type. The filtered signal after the execution of the step 101 corresponds to the signal as it would be received at the inner ear.

Next, in a step 102, the signal is split into a plurality of spectral bands having bandwidths which vary logarithmically with frequency so as to effect the transformation from frequency to pitch. In this embodiment, the signal is bandpass filtered into 20 bands each one-third of an octave in bandwidth, from 100 hertz to 8 kilohertz, according to International Standard ISO 532B; the ISO band filters are similar in shape when viewed on a logarithmic frequency axis and are well known and documented. The average signal amplitude in each of the 20 bands is calculated each 4 milliseconds, and the signal after filtering thus comprises a series of time segments each comprising 20 frequency band amplitude values. This bandpass filtering is performed for all the values in the test signal (which lasts on the order of several seconds, for example, 10 seconds).
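As an illustration of the band layout described in step 102, the following minimal sketch computes exact one-third-octave centre frequencies and band edges starting at 100 hertz. The function name `third_octave_bands` and the use of exact base-2 spacing (rather than the nominal rounded centre frequencies) are assumptions made for illustration only; the filter shapes of ISO 532B are not modelled.

```python
def third_octave_bands(f_start=100.0, n_bands=20):
    """Return (lower, centre, upper) frequencies in hertz for successive
    one-third-octave bands. Hypothetical helper: exact base-2 spacing is
    assumed, so centres differ slightly from nominal rounded values."""
    half_band = 2.0 ** (1.0 / 6.0)  # half of one third of an octave
    bands = []
    for k in range(n_bands):
        centre = f_start * 2.0 ** (k / 3.0)
        bands.append((centre / half_band, centre, centre * half_band))
    return bands


bands = third_octave_bands()
for lower, centre, upper in bands:
    print(f"{lower:8.1f} {centre:8.1f} {upper:8.1f}")
```

With these assumptions the twentieth band is centred a little above 8 kilohertz, which agrees with the stated range of 100 hertz to 8 kilohertz.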

The relatively wide filters take account of the masking within each filter band, and the broad, overlapping skirts of the filters ensure that spectral masking due to neighbouring frequencies is also taken account of.

Next, in step 103, frequency dependent auditory thresholds specified in International Standard ISO 226 are applied to each of the band outputs. This simulates the effect of the minimum audibility threshold indicated in FIG. 5a.

Next, in step 104, the bandpass signal amplitudes are converted to a phon or sensation level which is more equivalent to the loudness with which they would be perceived by a human auditory system. The conversion is non-linear, and depends upon both signal amplitude and frequency. Accordingly, to effect the conversion, the equal loudness contours specified in international standard ISO 226 are applied to each of the band outputs. Both these equal loudness contours and the thresholds used in step 103 are stored in the memory 83.

Next, in step 105, a temporal masking (specifically forward masking) is performed by providing an exponential decay after a significant amplitude value. In fact, the rate of decay of the masking effect depends upon the time of application of the masking sound; the decay time is higher for a longer time of application than for a shorter time. However, in this embodiment, it is found sufficient to apply a fixed exponentially weighted decay, defined by y=56.5·10^(-0.01x) (where y represents level and x represents time), which falls between the maximum decay (corresponding to over 200 milliseconds duration) and the minimum decay (corresponding to 5 milliseconds duration) encountered in practice.

In applying the forward masking, at each time segment for each bandpass filter amplitude, masking values for the corresponding bandpass in the three following time segments are calculated, using the above exponential decay. The three values are compared with the actual amplitudes of those bands, and if higher than the actual amplitudes, are substituted for the actual amplitudes.
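The substitution rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the decay is applied as a relative factor 10^(-0.01x) to the masker's own amplitude (the leading constant 56.5 is treated as a reference level and omitted), x is taken to be in milliseconds, amplitudes are linear, and the segment spacing defaults to 4 milliseconds. The name `forward_mask` is hypothetical.

```python
def forward_mask(frames, step_ms=4.0, lookahead=3):
    """Apply a simple forward-masking floor to band amplitudes.

    frames: list of time segments, each a list of band amplitudes.
    For every segment, a decayed copy of each band amplitude is computed
    for the next `lookahead` segments and substituted wherever it exceeds
    the actual amplitude. The decay form and units are assumptions.
    """
    masked = [list(frame) for frame in frames]
    for t, frame in enumerate(frames):
        for k in range(1, lookahead + 1):
            if t + k >= len(frames):
                break
            decay = 10.0 ** (-0.01 * k * step_ms)  # assumed relative decay
            for band, amplitude in enumerate(frame):
                floor = amplitude * decay
                if floor > masked[t + k][band]:
                    masked[t + k][band] = floor
    return masked


# A single click followed by silence in one band.
click = [[1.0], [0.0], [0.0], [0.0], [0.0]]
masked_click = forward_mask(click)
print(masked_click)
```

In this sketch the maskers are always taken from the unmodified input, so masking values do not cascade from one substituted segment to the next; whether the embodiment cascades them is not stated in the text.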

As noted above, it is also possible for a sound to mask an earlier occurring sound (so called "backward masking"). Preferably, in this embodiment, the forward masking process is replicated to perform backward masking, using the same type of exponential decay, but with different numerical constants (in other words, for each time segment, values of masking for earlier occurring time segments are calculated, and if higher than the actual amplitudes for those bands, are substituted for the actual amplitudes).

Thus, after step 105 the calculated signal data comprises a succession of time segment data each comprising 20 bandpass signal amplitudes, thresholded so that some amplitudes are zero, and the amplitude of a given band in a given time segment being dependent upon the amplitudes of corresponding bands in past and future time segments due to the forward and backwards masking processing.

This corresponds to a surface indicating, along the signal pitch and time axes, the masking effect which the test signal would have had upon the human ear if directly applied without the telecommunications apparatus 1.

FIGS. 8a and 8b show excitation surfaces generated by the above process. FIG. 8a corresponds to a speech event comprising a voiced sound followed by an unvoiced sound; the formant structure of the first sound and the broad band nature of the second sound can readily be distinguished. FIG. 8b shows a corresponding surface for two clicks, and the effect of the forward masking stage 105 of FIG. 7a is clearly visible in the exponential decays therein.

Next, in step 106, the test signal generator 7 repeats the test signal but this time it is supplied to the input port 2 of the telecommunications apparatus 1, and the output port 3 thereof is connected to the input port 6 of the test apparatus 4. The calculation stages 101-105 are then repeated, to calculate a corresponding surface for the received signal from the telecommunications apparatus 1.

Having calculated the effect on the ear (excitation) of the original test signal and of the output from the telecommunications apparatus (the distorted test signal), the difference in the extent to which the two excite the ear corresponds to the level of distortion of the test signal as perceived by the human auditory system. Accordingly, the amplitude transfer function of the telecommunications apparatus is calculated, for each segment, by taking the ratio between the corresponding bandpass amplitudes (or where, as in FIG. 8a or 8b, the bandpass amplitudes are represented on a dB scale, by taking the difference between the amplitude in dBs). To avoid an overall gain term in the transfer function, which is irrelevant to the perceived distortion produced by the telecommunications apparatus, each bandpass term may be normalised by dividing (or, when represented in dBs, subtracting) by the average amplitude over all bandpass filter outputs over all time segments in the test signal sequence, in step 107.
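The difference-and-normalise operation of step 107 can be sketched for surfaces held on a dB scale. The sketch below assumes that removing the mean of the cell-by-cell difference is an acceptable way to discard the overall gain term (it is equivalent to subtracting each surface's own average before differencing); the name `error_surface_db` and the nested-list layout (time segments by pitch bands) are illustrative assumptions.

```python
def error_surface_db(reference_db, distorted_db):
    """Return the gain-normalised error surface in dB.

    reference_db, distorted_db: equally sized lists of time segments,
    each a list of band levels in dB. The mean difference over all
    cells is subtracted so that a pure level change gives a flat,
    all-zero surface. Layout and normalisation are assumptions.
    """
    diff = [
        [d - r for r, d in zip(ref_row, dis_row)]
        for ref_row, dis_row in zip(reference_db, distorted_db)
    ]
    n_cells = sum(len(row) for row in diff)
    mean_gain = sum(sum(row) for row in diff) / n_cells
    return [[value - mean_gain for value in row] for row in diff]


reference = [[10.0, 20.0], [30.0, 40.0]]
louder = [[16.0, 26.0], [36.0, 46.0]]  # same signal, 6 dB louder
flat = error_surface_db(reference, louder)
print(flat)
```

A uniform 6 dB level change therefore yields zero error in every cell, matching the statement that an overall gain term is irrelevant to the perceived distortion.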

If the original test signal and the output of the telecommunications apparatus 1 are identical, but for some overall level difference (that is to say, if the telecommunications apparatus 1 introduces no distortion), the ratio between each bandpass filter output of the two signals will be unity, and the logarithmic difference in dBs in amplitude will be zero; accordingly, the plot of the surface representing the distortion over time and pitch would be completely flat at all times and in all pitch bands. Any deviation is due to distortion in the telecommunications apparatus. Additive distortion errors will appear as peaks, and signal loss will appear as troughs, relative to the undistorted average level.

The sequence of sets of bandpass auditory excitation values (corresponding to a surface along the time and pitch axes) is divided into contiguous sectors of length 96 milliseconds (i.e. 48 successive 2 millisecond segments) so as to include at least two different values for the lowest pitch band. The total amount of error, or error activity, is calculated in step 109 as: $$E_A = \sum_{i} \sum_{j} |c(i,j)|$$ where c(i,j) is the error value in the i^(th) time segment and j^(th) pitch band of the error surface sector to be analyzed, and the sums run over all time segments and pitch bands of the sector.

This gives an indication of the absolute amount of distortion present.

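Then, the distribution of the error over time and pitch (or rather, the entropy of the distortion, which corresponds to the reciprocal of the extent to which the energy is distributed) is calculated in step 120 as follows: $$E_E = -\sum_{i} \sum_{j} a(i,j) \ln a(i,j)$$ where a(i,j) is the absolute magnitude of the error c(i,j) expressed as a proportion of the total error over all time segments (i) and pitch bands (j) of the sector.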

The log term in the above expression controls the extent to which the distribution of energy affects the entropy E_(E), acting as a non-linear compression function.

It is found that the error activity and error entropy criteria together correspond well to the subjectively perceived level of distortion, as the listener will find a high level of error considerably more noticeable if it is concentrated at a single pitch over a short period of time, rather than being distributed over pitch and time.

The two measures are combined, together with appropriate weightings, and the combined measure is thresholded in step 110. An output signal is generated (in step 111) to indicate whether or not the threshold has been passed.

Referring to FIGS. 10a and 10b, where the error is uniformly distributed over time and pitch as shown in FIG. 10a, the total error activity E_A is 200 and the error entropy E_E is at a relatively high level of 4.605.

Referring to FIGS. 11a and 11b, the same amount of total error (error activity E_A = 200) is distributed substantially into a broad peak. The error entropy E_E is correspondingly lower (E_E = 3.294).

Referring now to FIGS. 12a and 12b, where the same amount of error is contained in a single spike in a single time/pitch cell, the error entropy is much lower (E_E = 0.425).

FIG. 12c illustrates the effect which would be achieved by scaling the error at every time/pitch cell by 10. The total amount of error (E_A) has increased to 2000, but the error entropy (E_E) is still 0.425.

Thus, the error entropy E_E gives a measure of the distribution of the error which is independent of the magnitude of the total amount of error, whereas the error activity E_A gives a measure of the amount of error which is independent of its distribution.
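The figure quoted for the uniform case can be checked by hand. The working below assumes the entropy takes the form given in the claims (the sum of -a·ln(a) over the cell proportions a(i,j)), and assumes a sector of 100 cells; that grid size is not stated in the text but is consistent with the quoted value of 4.605. With a total error of 200 spread evenly, each cell holds 2 units:

    $$a(i,j) = \frac{2}{200} = \frac{1}{100}, \qquad E_E = -\sum_{i,j} \frac{1}{100} \ln \frac{1}{100} = \ln 100 \approx 4.605$$

The independence from magnitude follows from the same form. Scaling every cell by a factor k multiplies the total by k but leaves every proportion unchanged:

    $$a_k(i,j) = \frac{|k\,c(i,j)|}{\sum_{i,j} |k\,c(i,j)|} = \frac{|c(i,j)|}{\sum_{i,j} |c(i,j)|} = a(i,j), \qquad E_A \rightarrow k\,E_A$$

so the entropy is unaffected, as in the comparison of FIG. 12c with FIGS. 12a and 12b.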

In fact, to take account of the logarithmic units of the audible error amplitude scale employed in this embodiment, it is convenient to recast E_A and E_E as E'_A and E'_E, as follows: ##EQU4##

The error activity and error entropy measures can then be combined to give a good indication of what the subjective listener response to distortion would be, in a manner which is relatively robust to the actual nature of the distortion.

For example, we have found that a good indication of the subjective "listening effort" measurement Y_LE is given by

    $$Y_{LE} = -a_1 + a_2 \log_{10} E'_A + a_3 E'_E$$

where a_1 = 8.373, a_2 = 0.05388 and a_3 = 0.4090.

In greater detail, therefore, the process performed by the analyzer 8 in the combining step 110 comprises:

1. Calculating E'_E and E'_A for each time segment of the test signal.

2. Summing the error activity and error entropy values over time to form an average value of the error activity E'_A and an average value of the error entropy E'_E over the whole duration of the test signal.

3. Using these values, forming a measure of the subjective impact of distortion, Y_LE = -a_1 + a_2 log_10 E'_A + a_3 E'_E.

The averages formed in step 2 above may simply be arithmetic means, or (with appropriate scaling elsewhere in the combination process) sums. However, preferably, the averages are formed with different weightings being given to the error activity and error entropy values from different time segments, depending upon their importance to a listener. For example, segments of the test signal which correspond to sounds which occur frequently in natural speech may be given a higher weighting, since distortion of these sounds will be particularly noticeable to the listener. Further, a higher weighting may be given to time segments which follow time segments containing silence, so that the noticeable effects of clipping of the beginnings of words (which considerably reduces intelligibility) due to the delayed onset of voice switches are given a high weighting. Further details are in our International Patent Application PCT/GB94/01305 (published on 5th Jan. 1995 as WO95/01011).
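Steps 1 to 3, together with the optional per-segment weighting, can be sketched as follows. This is an illustrative sketch only: it takes the per-segment values of E'_A and E'_E as given inputs (their defining expressions are not reproduced here), the function names are hypothetical, and the weights are whatever the designer chooses. Only the coefficients a_1, a_2 and a_3 come from the text.

```python
import math

# Coefficients quoted in the description.
A1, A2, A3 = 8.373, 0.05388, 0.4090


def weighted_mean(values, weights=None):
    """Arithmetic mean, or a weighted mean when per-segment weights are given."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def listening_effort(activities, entropies, weights=None):
    """Combine per-segment E'_A and E'_E values into a Y_LE estimate."""
    mean_activity = weighted_mean(activities, weights)  # step 2, activity
    mean_entropy = weighted_mean(entropies, weights)    # step 2, entropy
    # Step 3: weighted combination of log activity and entropy.
    return -A1 + A2 * math.log10(mean_activity) + A3 * mean_entropy
```

Passing `weights=None` gives the simple arithmetic mean; passing larger weights for, say, segments that follow silence emphasises those segments in both averages.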

Further details of the derivation of the function used to combine the error activity and error entropy values in the step 110 will now be discussed.

The effects of modulated noise reference unit (MNRU) distortion were added to prerecorded files of human speech, used as test signals in place of the signal generator 7, and average error activity and error entropy values were derived using the analyser 8 following steps 101 to 120 described above. The analysis was repeated for different levels of distortion, and the resulting error activity and error entropy values are plotted against the level of distortion in FIGS. 13 and 14 respectively. It will be seen that the log error activity is approximately negatively proportional to the level of distortion, and that the error entropy is approximately proportional to the level of distortion.

Next, the same distorted speech was played to a panel of human listeners, who provided measurements of listening effort Y_LE according to internationally followed experimental practice, on a scale of 1-5. The average of the human listeners' scores for the varying levels of distortion is shown in FIG. 15. The shape of the relationship of FIG. 15 can be described by:

    $$\frac{Y - 1}{Y_{max} - 1} = \frac{1}{1 + e^{4S(M - Q)}}$$

where Y is the opinion score, S = 0.0551, M = 11.449, Y_max = 4.31, and Q is the equivalent quantisation distortion in dB.
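Solving this relationship for Y gives the opinion score directly as a function of Q. A minimal sketch using the constants quoted above (the function name is an assumption):

```python
import math

# Constants quoted for the fitted curve of FIG. 15.
S, M, Y_MAX = 0.0551, 11.449, 4.31


def opinion_score(q_db):
    """Opinion score Y for an equivalent quantisation distortion Q in dB."""
    return 1.0 + (Y_MAX - 1.0) / (1.0 + math.exp(4.0 * S * (M - q_db)))
```

The curve rises from 1 at very low Q towards Y_max at very high Q, passing through the midpoint (1 + Y_max)/2 when Q equals M.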

Next, the log error activity values and the error entropy values shown in FIGS. 13 and 14 were fitted, by linear regression, to the distortion levels. The regression gave the following relationship for the distortion Q:

    $$Q = -55.09 - 0.5556 \log_{10} E'_A + 2.624\, E'_E$$

Next, the relationship between the distortion and the opinion score Y_LE subjectively determined by human listeners was used to convert the relationship between distortion and error activity and entropy to a prediction of opinion scores (based on error activity and error entropy). The relationship thus given is:

    $$Y_{LE} = -8.373 + 0.05388 \log_{10} E'_A + 0.4090\, E'_E$$

In FIG. 16, the dotted trace shows the predicted subjective opinion score calculated in this manner and the solid line indicates the subjective listener scores (redrawn from FIG. 15). The agreement is seen to be close.

To determine the robustness of the predicted subjective opinion score thus calculated, the last calculation above was utilised in the combining step of an analyser 8 according to the invention. The signal generator 7 in this test merely supplied prerecorded, known, human speech, and the telecommunications apparatus 1 was three commercial low bitrate coder/decoders (codecs). The outputs of the codecs were also played to a bank of human listeners, who rated the quality of the output, as above, on a scale of 1-5.

The distortion introduced by each codec is complex, but includes some quantisation distortion and some time varying distortion due to adaptation of the codec in use.

The results are reproduced below:

    _________________________________________________________________________
    Coding Algorithm               MOS (Experimental)     MOS (Prediction)
    _________________________________________________________________________
    Commercial low-rate codec A    3.39                   2.90
    Commercial low-rate codec B    3.16                   2.67
    Commercial low-rate codec C    2.65                   2.94
    _________________________________________________________________________

It will be seen that, although the combination step 110 of the analyser 8 was only determined in the context of MNRU distortion, and each of the codecs employed a different type of distortion, the predicted human opinion scores were within 0.5 opinion units (i.e. 10% of the range) for each of the codecs.
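The stated bound can be confirmed from the tabulated values. The short check below simply takes the absolute difference between the experimental and predicted scores for each codec; the dictionary layout and names are assumptions made for the example.

```python
# (experimental MOS, predicted MOS) for each codec, as tabulated above.
results = {
    "A": (3.39, 2.90),
    "B": (3.16, 2.67),
    "C": (2.65, 2.94),
}

# Absolute prediction error per codec, rounded to the table's precision.
errors = {
    name: round(abs(experimental - predicted), 2)
    for name, (experimental, predicted) in results.items()
}

largest_error = max(errors.values())
```

The differences are 0.49, 0.49 and 0.29 opinion units, so the largest error is just under 0.5.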

Thus, it will be seen that this invention is capable of providing an indication of distortion of telecommunications apparatus which is close to the subjective opinion of a human listener, and is relatively robust to different types of distortion.

Second Embodiment

In the second embodiment, the analysis unit 8 is the same as or similar to that in the first embodiment. However, the test signal generating unit 7 does not utilise the P50 test signal, but instead generates a different type of artificial, speech-like test signal.

Whilst the P50 test signal is acceptable for many purposes, it is observed to lack a full range of fricative sounds. Furthermore, it has a rather regular and monotonous long term structure, which sounds rather like a vowel-consonant-vowel-consonant . . . sequence. As discussed above, however, since many telecommunications systems include time dependent elements such as automatic gain controls or voice switches, the distortion applied to any given portion of the test signal is partly dependent upon the preceding portion of the test signal; in other words, the context of that portion of the speech signal within the time sequence of the signal as a whole.

Accordingly, in this embodiment, a small, representative, subset of speech segments (selected from the tens of known phonemes) is utilised, and a test signal is constructed from these sounds assembled in different contextual sequences. Since distortion is being measured, it is more important that the test sequence should include successions of sounds which are relatively unlike one another or, more generally, are relatively likely to cause distortion when one follows another. In a simpler form of this embodiment, the test signal might comprise each of the selected segments prefixed by a conditioning portion selected from a high, low or zero level, so that the test signal enables each representative speech segment (phoneme) to be tested following prefixed sounds of different levels. The length of the prefixing signal is selected to extend over the time constants of the system under test; for example, codec adaptation and active gain control take on the order of a few seconds, whereas speech transducer transient response is on the order of a few milliseconds.
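The simpler form of the test signal, in which every selected segment is preceded by each conditioning level in turn, can be sketched as an ordering problem. The segment labels and the function name below are hypothetical, and the sketch produces only the sequence plan (which conditioning portion precedes which segment), not the audio itself.

```python
from itertools import product


def build_test_sequence(segments, levels=("high", "low", "zero")):
    """Pair every speech segment with every conditioning level.

    Returns a list of (conditioning_level, segment) pairs in playback order,
    so that each segment is tested after a prefix of each level.
    """
    return [(level, segment) for segment, level in product(segments, levels)]


# Hypothetical labels standing in for the selected speech segments.
plan = build_test_sequence(["segment_1", "segment_2"])
```

With two segments and three conditioning levels the plan contains six entries; the duration of each conditioning portion would then be chosen to span the time constants of the system under test, as described above.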

Further details of this embodiment are to be found in the above-mentioned International Patent Application No. PCT/GB94/01305 (published as WO95/01011), the contents of which are incorporated herein by reference in their entirety. The test signal of this embodiment could also be utilised with conventional analysis means.

Third Embodiment

In a third embodiment of the invention, the test signal generator 7 operates in the same manner as in the first or second embodiments. However, the operation of the analysis unit 8 differs in step 102.

Although the logarithmically spaced filters of the first embodiment are found to be a reasonable approximation to the pitch scale of the human ear, it is found that an even better performance is given by the use of filters which are evenly spaced on a Bark scale (as discussed above). Accordingly, in step 102, the twenty bandpass filters are rounded exponential (roex) filters spaced at one Bark intervals on the pitch scale. The rounded exponential function is described in "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns", B. C. J. Moore and B. R. Glasberg, J. Acoust. Soc. Am. 74, 750-753 (1983).

Rather than calculating the average signal amplitude in each band every four milliseconds, in this embodiment, the signal amplitude is calculated over different averaging periods for the different bands, averaging over two milliseconds for the highest pitch band and 48 milliseconds for the lowest pitch band, with intervening averaging times for the intervening bands. It is found that varying the temporal resolution in dependence upon the pitch (or, in general, the frequency) so as to resolve over a longer interval at lower frequencies gives a substantially improved performance.

For subsequent processing, as before, for each two millisecond time segment, an array of bandpass filter output values is generated. For bands lower than the highest pitch, values are repeated more than once for intervening time segments (for example, for the lowest pitch band, each value is repeated 24 times for the two millisecond time segments between each 48 millisecond average amplitude value). It would, of course, be possible to perform a numeric interpolation between succeeding values, rather than merely repeating them.
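Both options for placing the slower bands on the common two millisecond grid can be sketched briefly. The function names are assumptions, and linear interpolation is used here as one possible form of the numeric interpolation mentioned above; the text does not say which form would be used.

```python
def hold(values, repeat):
    """Repeat each averaged value so it fills `repeat` two-millisecond slots."""
    return [v for v in values for _ in range(repeat)]


def interpolate(values, repeat):
    """Linearly interpolate between succeeding values on the finer grid."""
    if not values:
        return []
    out = []
    for current, following in zip(values, values[1:]):
        step = (following - current) / repeat
        out.extend(current + step * k for k in range(repeat))
    # No later value exists after the last one, so it is simply held.
    out.extend([values[-1]] * repeat)
    return out


# Lowest pitch band: one value per 48 ms, expanded to 24 slots of 2 ms each.
held = hold([1.0, 3.0], 24)
smoothed = interpolate([0.0, 24.0], 24)
```

Either way the lowest band yields 24 grid values per 48 millisecond average, so every band contributes one value to each two millisecond array.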

The steps 103-106 are the same as in the first embodiment (with the adjustment of numerical constants to reflect the different filter responses).

Fourth Embodiment

In this embodiment, the analyser 8 is arranged to perform one further step in the process of FIG. 7b, to calculate the extent to which the distortion of the test signal is correlated to the original test signal generated by the test signal generator 7 over time.

The inclusion of an error-correlation parameter enables the analyser 8 to take account of the (subjectively noticeable) effects which depend on the degree to which any audible error is correlated with the input signal. Similarly it enables the analyser 8 to take account of the (subjectively noticeable) effects of temporally displaced versions of the test signal, known as echo and "pre-echo" (i.e. the early arrival of a small portion of the test signal).

Noise-like errors which are highly correlated with the signal are subjectively less noticeable than a noise-like error of similar energy which is uncorrelated. This is because the listener's brain is busy when listening to the signal, so noise is less distracting than when the brain is not preoccupied with interpreting the signal.

A separate set of correlation values is calculated for one or more of the frequency or pitch bands. Denoting the amplitude value of the difference or transfer function surface calculated in this step 108 for a single frequency band as x_t, and the corresponding element of the excitation surface of the test signal calculated in step 106 as y_t, and the length of the analysis segment as N (typically, the length of a segment of the test signal), the analyser 8 calculates a set of cross correlation coefficients R_i, where i = 0, 1, 2 . . . , by calculating: ##EQU5##

The two significant parameters are the delay between the test signal and the corresponding echo portion of the distorted signal, and the amplitude of the echo portion of the distorted signal. The amplitude of the echo portion is given by the largest value of cross correlation coefficient (R_i max), and the delay is given by the value of i which corresponds to that maximum.
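Extracting the two parameters from a set of coefficients can be sketched as below. The exact expression used for R_i is not reproduced here, so this sketch substitutes a plain, unnormalised cross correlation as a stated assumption; only the final step (taking the largest coefficient as the echo amplitude and its index as the delay) follows the description directly. Function names and the sample data are hypothetical.

```python
def cross_correlation(x, y, max_lag):
    """Unnormalised cross correlation of x against y delayed by i samples.

    ASSUMPTION: this form stands in for the source's expression for R_i.
    x is the error surface for one band, y the test signal excitation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must cover the same analysis segment")
    n = len(x)
    return [
        sum(x[t] * y[t - i] for t in range(i, n))
        for i in range(max_lag + 1)
    ]


def echo_parameters(x, y, max_lag):
    """Return (delay, amplitude): the lag of the largest R_i and its value."""
    r = cross_correlation(x, y, max_lag)
    delay = max(range(len(r)), key=lambda i: r[i])
    return delay, r[delay]


# Hypothetical single-band data: an impulse and a half-size copy 2 steps later.
test_excitation = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
error_surface = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
delay, amplitude = echo_parameters(error_surface, test_excitation, 4)
```

In this toy case the largest coefficient occurs at a lag of two time steps, which is where the delayed copy was placed.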

In this embodiment, each of these parameters is fitted (e.g. by linear regression) so that the predicted subjective opinion score Y_LE is a function of the error activity, error distribution, error delay and error temporal correlation.

Effects of the Invention

Referring to FIGS. 9a-9e, the representation of various types of telecommunications apparatus' distortion of the test signal of FIG. 8a by the first and second embodiments of the invention will now be illustrated.

FIG. 9a shows the error excitation surface produced by instantaneous amplitude distortion produced by adding low amplitude second and third order terms to the signal. The distortion was characterised as "barely audible" by a human listener. FIG. 9b shows the corresponding error amplitude surface for fully audible nonlinear distortion of the same type, but with higher value second and third order terms. The amplitude of the error is much larger. Additionally, it will be seen that the majority of the distortion loudness coincides with the voiced part of the test signal of FIG. 8a, since this contains low frequency formant tones whose harmonics are perceptually significant.

Referring to FIG. 9c, the effects of modulated noise reference unit (MNRU) distortion are shown. MNRU distortion is described in Annex A of CCITT Recommendation P.81, and is designed to be theoretically equivalent to the distortion introduced by a single A-law PCM stage (of the kind widely used in telecommunications systems). The level of distortion was characterised as fully audible by a human listener. Again, it will be seen from FIG. 9c that the perceptual distortion is associated chiefly with formants in the voiced part of the test signal.

Referring to FIG. 9d, when crossover distortion is supplied (i.e. distortion of the kind y=mx+c for x greater than zero and y=mx-c for x less than zero), low amplitude signals are not transmitted, and so the lower energy unvoiced sound in the second part of the test signal is drastically attenuated. FIG. 9d therefore suggests a very significant subjective impact of this kind of distortion, which corresponds with the reaction of the human listener.

Finally, FIG. 9e illustrates the effects of a voice activity detector with a 50 millisecond onset time. In the initial part of the signal, there is a large (negative) error because the signal has been clipped. The following (positive) error is due to overshoot or settling.

Other Alternatives and Modifications

It will be clear from the foregoing that many variations to the above described embodiments can be made without altering the principle of operation of the invention. For example, if the telecommunications apparatus is arranged to receive a digital input, the DAC 71 may be dispensed with. The signal from the output port 5 could be supplied in digital form to the input port 2 of the telecommunications apparatus and the ADC 81 may likewise be dispensed with. Alternatively, an electromechanical transducer could be provided at the output port 5 and the signal supplied as an audio signal. In the latter case the test signal may be supplied via an artificial mouth as discussed in CCITT P.51 Recommendation on Artificial Ear and Artificial Mouth, Volume 5, Rec P.51, Melbourne 1988 and earlier UK patent application GB2218300 (8730347), both incorporated herewith by reference. Similarly, the distorted speech signal could be received via an artificial ear acoustic structure as described in the above CCITT Recommendation and our earlier UK patent application GB2218299 (8730346) incorporated herein by reference. This would reduce the filtering needed in the step 101.

As well as using the error activity and distribution measures to determine the subjective impact of distortion, as discussed above, in further embodiments the rate of change of these parameters over time during the test signal may be used, since rapidly changing distortion may sound less acceptable to a listener.

Although in the above described embodiments a single decay profile for temporal masking is described, it may be preferred in alternative embodiments of the invention to provide a plurality (for instance 2) of decay rates for forward (and backward) masking, and to select the required decay rate in dependence upon the duration of the masking sound (i.e. the number of time segments over which the amplitude in one of the passbands exceeds a predetermined level). For example, maximum and minimum decays (corresponding to 200 milliseconds and 5 milliseconds duration respectively) may be defined by:

    $$y = 58.4039 \cdot 10^{-0.0059x}$$

    $$y = 55.5955 \cdot 10^{-0.0163x}$$
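The two profiles, and one possible way of choosing between them, can be sketched as follows. The coefficients are those quoted above; everything else is an assumption made for illustration. In particular, the units of x and y are not stated at this point in the text, and the nearest-duration selection rule is a hypothetical stand-in for whatever rule an implementation would adopt.

```python
def decay_long_masker(x):
    """Decay profile quoted for a masking sound of 200 ms duration."""
    return 58.4039 * 10 ** (-0.0059 * x)


def decay_short_masker(x):
    """Decay profile quoted for a masking sound of 5 ms duration."""
    return 55.5955 * 10 ** (-0.0163 * x)


# Masker durations (ms) for which a profile is defined in the text.
PROFILES = {200.0: decay_long_masker, 5.0: decay_short_masker}


def decay_for_duration(masker_duration_ms):
    """HYPOTHETICAL rule: pick the profile with the nearest defined duration."""
    nearest = min(PROFILES, key=lambda d: abs(d - masker_duration_ms))
    return PROFILES[nearest]
```

The profile for the short masker falls away more steeply, so a brief masking sound leaves a shorter-lived masking effect than a sustained one.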

Although connections to an actual telecommunications apparatus have been described herein, it would equally be possible to programme a computing apparatus to simulate the distortions introduced by telecommunications apparatus, since many such distortions are relatively easy to characterise (for example, those due to VADs or codecs). Accordingly, the invention extends likewise to embodiments in which a signal is supplied to such simulation apparatus, and the simulated distorted output of the telecommunications apparatus is processed. In this way, the acceptability to a human listener of the combination of many complicated and nonlinear communications apparatus may be modelled prior to assembling or connecting such apparatus in the field.

Although the analysis unit 8 and test signal generator 7 have been described as separate hardware, in practice they could be realised by a single suitably programmed digital processor; likewise, the telecommunications apparatus simulator referred to in the above embodiment could be provided by the same processor.

Although in the above described embodiments the analyzer unit 8 receives and analyses the test signal from the test signal generator 7, in practice the analyzer unit 8 could store the excitation data previously derived for the, or each of several, test sequences by an earlier analysis. Thus, the analyzer unit in such embodiments need not be arranged itself to analyze the undistorted test signal.

Although linear regression has been described as a method of finding the combination process used in the combination step 110, it would equally be possible to use a higher order regression, for example a logistic and double quadratic expansion as follows: ##EQU6##

Then the estimated value of opinion score, Y', is given by:

    $$Y' = \frac{4}{1 + e^{-w}}$$

where

    $$w = \ln\left(\frac{Y_{LE}}{4 - Y_{LE}}\right)$$
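The two expressions are a logit transform and its inverse, which the short sketch below makes explicit. One reading, offered here as an assumption rather than as the text's stated method, is that the regression expansion is fitted to w, and the logistic expression then maps the fitted w back to an opinion score confined between 0 and 4. The function names are hypothetical.

```python
import math


def logit_score(y_le, y_top=4.0):
    """w = ln(Y_LE / (4 - Y_LE)); defined for scores strictly between 0 and 4."""
    return math.log(y_le / (y_top - y_le))


def estimated_score(w, y_top=4.0):
    """Y' = 4 / (1 + e^(-w)); maps any real w into the open interval (0, 4)."""
    return y_top / (1.0 + math.exp(-w))
```

Applying `estimated_score` to `logit_score(y)` returns y, and however large or small the fitted w becomes, the estimated score cannot leave the 0 to 4 interval.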

Finding the coefficients b_i is achieved by an iterative weighted least squares calculation; many statistical calculation programmes are available for this purpose, including for example GLIM (™).

In this document, for convenience, the term "phoneme" is used to indicate a single, repeatable, human speech sound, notwithstanding that in its normal usage a "phoneme" may denote a sound which is modified by its speech context.

Unless the reverse is indicated or apparent, the features of the above embodiments may be combined in manners other than those explicitly detailed herein.

Although the embodiments described above relate to testing telecommunications apparatus, the application of novel aspects of the invention to other testing or analysis is not excluded.

I claim:
 1. Telecommunications testing apparatus arranged to receive a distorted signal which corresponds to a test signal when distorted by telecommunications apparatus to be tested, the testing apparatus comprising: means for periodically deriving, from the distorted signal, a plurality of spectral component signals responsive to the distortion in each of a plurality of spectral bands, over a succession of time intervals, means for generating a measure of the subjective impact of the distortion due to the telecommunications apparatus, said measure of subjective impact changing as a function of the distribution of the distortion over time and over said spectral bands, and means for generating a measure of the total amount of said distortion over a predetermined time segment, and to provide a quantitative measure of said subjective impact based on both said measure of distribution of the distortion and said total amount of distortion.
 2. Apparatus as in claim 1, in which a measure of distortion distribution E_E is related to the sum over all time intervals (i) and spectral bands (j) of the value: -a(i,j)·ln (a(i,j)), where a(i,j) is the absolute magnitude of distortion in a predetermined time interval (i), and spectral band (j), expressed as a proportion of the total distortion over all time intervals (i) and spectral bands (j).
 3. Apparatus as in claim 1, in which the means for generating a measure of the subjective impact of distortion estimates the extent to which the distortion will be perceptible to a human listener.
 4. Apparatus as in claim 3 in which the means for generating a measure of the subjective impact of distortion performs spectral and temporal masking calculations.
 5. Apparatus as in claim 1, in which the means for generating a measure of the subjective impact of distortion performs a pitch analysis upon said distorted signal, in which said spectral bands comprise pitch bands.
 6. Apparatus as in claim 1, further comprising a signal generator for supplying a test signal which has a spectral resemblance to human speech.
 7. Apparatus as in claim 6, in which said test signal does not correspond to a single speaker conveying intelligent content speech.
 8. Apparatus as in claim 6 in which: the signal generator generates a test signal which comprises a sequence formed of a predetermined, small, number of speech segments, the speech signal comprising several different portions including said segments such that said segments are represented in several different temporal contexts within said sequence, so as to vary the effects on each segment of time varying distortions in the telecommunications apparatus.
 9. Apparatus as in claim 1, in which the means for generating a measure of the subjective impact of distortion analyzes the distorted signal, and forms, for each spectral band over each time interval, a measure of the difference between the distorted signal and the test signal.
 10. Telecommunications testing apparatus arranged to receive a distorted signal which corresponds to a test signal when distorted by telecommunications apparatus to be tested, the testing apparatus comprising: means for periodically deriving, from the distorted signal, a plurality of spectral component signals responsive to the distortion in each of a plurality of spectral bands, over a succession of time intervals, means for generating a measure of the subjective impact of the distortion due to the telecommunications apparatus, said measure of subjective impact changing as a function of the distribution of the distortion over time and over said spectral bands, and means for generating a measure of temporal correlation between distortion and the test signal, and to generate said subjective impact measure in dependence upon said temporal correlation measure.
 11. Apparatus as in claim 10 wherein said means for generating a measure of temporal correlation utilizes a temporally displaced version of the test signal.
 12. Telecommunications testing apparatus arranged to receive a distorted signal which corresponds to a test signal when distorted by telecommunications apparatus to be tested, the testing apparatus comprising: means for periodically deriving, from the distorted signal, a plurality of spectral component signals responsive to the distortion in each of a plurality of spectral bands, over a succession of time intervals, means for generating a measure of the subjective impact of the distortion due to the telecommunications apparatus, said measure of subjective impact changing as a function of the distribution of the distortion over time and over said spectral bands, and said time intervals being longer for lower frequency spectral component signals than for higher frequency spectral component signals.
 13. A method of analysing the output of a speech signal handling apparatus to derive a measure of the audibility of distortion generated thereby, the method comprising: providing a predetermined test signal; analysing the distorted signal corresponding to the test signal when distorted by the apparatus, using a digital electronic distortion calculation apparatus; and generating an indication of the subjective impact of said distortion based on said analysis; said step of analysing including deriving a distribution measure of the spectral and temporal distribution of said distortion, deriving a total measure of the total amount of distortion, and said distribution measure and said total measure being both used to derive said subjective impact measurement.
 14. A method as in claim 13 in which a measure of the extent to which the distortion will be perceptible to a human listener is derived.
 15. A method as in claim 14 in which the analysis step comprises spectral and temporal masking calculations.
 16. A method as in claim 13 in which a pitch analysis is performed on the distorted signal.
 17. A method as in claim 13 in which a signal having a spectral resemblance to human speech is used as a test signal.
 18. A method as in claim 17 in which the test signal does not correspond to a single speaker conveying intelligent content speech.
 19. A method as in claim 17 in which: the test signal comprises a sequence formed of a predetermined, small, number of speech segments, and the speech signal comprises several different portions including said segments such that said segments are represented in several different temporal contexts within said sequence, so as to vary the effects on each segment of time varying distortions in the telecommunications apparatus.
 20. A method as in claim 13 in which the measure of distortion is determined by analysing the distorted signal, and forming, for each spectral band over each time interval, a measure of the difference between the distorted signal and the test signal.
 21. A method of analyzing the output of a speech signal handling apparatus to derive a measure of the audibility of distortion generated thereby the method comprising: providing a predetermined test signal; analyzing the distorted signal corresponding to the test signal when distorted by the apparatus using a digital electronic distortion calculation apparatus: generating an indication of the subjective impact of said distortion based on said analysis; said step of analyzing including deriving a measure of the spectral and temporal distribution of said distortion, and said distribution measure being used to derive said subjective impact measurement; measure of distortion E_E being determined as the sum over all time intervals (i), and spectral bands (j), of the value: -a(i,j)·ln (a(i,j)), where a(i,j) is the absolute magnitude of distortion in a predetermined time interval (i), and spectral band (j), expressed as a proportion of the total distortion over all time intervals (i) and spectral bands (j).
 22. A method of analyzing the output of a speech signal handling apparatus to derive a measure of the audibility of distortion generated thereby, the method comprising: providing a predetermined test signal; analyzing the distorted signal corresponding to the test signal when distorted by the apparatus, using a digital electronic distortion calculation apparatus; generating an indication of the subjective impact of said distortion based on said analysis; said step of analyzing including deriving a measure of the spectral and temporal distribution of said distortion and said distribution measure being used to derive said subjective impact measurement; and deriving a measure of the temporal correlation between the distortion and the original signal.
 23. A method of analyzing the output of a speech signal handling apparatus to derive a measure of the audibility of distortion generated thereby, the method comprising: providing a predetermined test signal; analyzing the distorted signal corresponding to the test signal when distorted by the apparatus, using a digital electronic distortion calculation apparatus; generating an indication of the subjective impact of said distortion based on said analysis; said step of analyzing including deriving a measure of the spectral and temporal distribution of said distortion and said distribution measure being used to derive said subjective impact measurement; deriving a measure of the total amount of distortion over a predetermined period of time, and the input signal being assessed over a plurality of time intervals, said time intervals being longer for lower frequency spectral component signals than for higher frequency spectral component signals.
 24. A method for quantitatively measuring distortionin speech handling circuits, said method comprising the stepsof:inputting a test signal into a speech handling circuit; generating anintermediate quantitative measure of respectively corresponding speechsignal distortion in an output of the speech handling circuit as amulti-dimensional function of both time and frequency; generating adistortion error entropy measurement based on said intermediatemulti-dimensional measure, said distortion error entropy measurementalso varying as a function of the distribution of measured distortionover time and frequency even when the total aggregate of measureddistortion over a time and frequency remains constant.
 25. A method asin claim 24 further comprising:generating an estimate of subjective meanopinion scores for the measured distortion by forming a weighted sum ofthe distortion error entropy measurement and the total aggregatemeasured distortion over time and frequency.
 26. A method as in claim24, further comprising the step of deriving a measure of the totalamount of distortion over a predetermined period of time.
 27. A methodas in claim 26 in which a measure of distribution E_(E) is determined asthe sum over all time intervals (i), and spectral bands (j), of thevalue: -a(i,j)·ln (a(i,j)), where a(i,j) is the absolute magnitude ofthe distortion in a predetermined time interval (i), andspectral band(j), expressed as a proportion of the total distortion over all timeintervals (i) and spectral bands (j).
28. A method as in claim 24 further comprising the step of deriving a measure of the temporal correlation between the distortion and the original signal.
29. A method as in claim 24 in which a measure of the extent to which the distortion will be perceptible to a human listener is derived.
30. A method as in claim 29 in which the analysis step comprises spectral and temporal masking calculations.
31. A method as in claim 24 in which a pitch analysis is performed on the distorted signal.
32. A method as in claim 24 in which a signal having a spectral resemblance to human speech is used as a test signal.
33. A method as in claim 32 in which the test signal does not correspond to a single speaker conveying intelligent content speech.
34. A method as in claim 32 in which: the test signal comprises a sequence formed of a predetermined, small, number of speech segments, and the speech signal comprises several different portions including said segments such that said segments are represented in several different temporal contexts within said sequence, so as to vary the effects on each segment of time varying distortions in the telecommunications apparatus.
35. A method as in claim 24 in which the measure of distortion is determined by analyzing the distorted signal, and forming, for each spectral band over each time interval, a measure of the difference between the distorted signal and the test signal.
36. A method as in claim 24 in which the input signal is assessed over a plurality of time intervals, said time intervals being longer for lower frequency spectral component signals than for higher frequency spectral component signals.
37. Apparatus for quantitatively distortion measuring in speech handling circuits, said apparatus comprising: means for inputting a test signal into a speech handling circuit; means for generating an intermediate quantitative measure of respectively corresponding speech signal distortion in an output of the speech handling circuit as a multi-dimensional function of both time and frequency; means for generating a distortion error entropy based on said intermediate multi-dimensional measure, said distortion error entropy measurement also varying as a function of the distribution of measured distortion over time and frequency even when the total aggregate of measured distortion over a time and frequency remains constant.
38. Apparatus as in claim 37 further comprising: means for generating an estimate of subjective mean opinion scores for the measured distortion by forming a weighted sum of the distortion error entropy measurement and the total aggregate measured distortion over time and frequency.
39. Apparatus as in claim 37 further comprising means for generating a measure of the total amount of said distortion over a predetermined time segment, and to provide a quantitative measure of said subjective impact based on both said measure of distribution of the distortion and said total amount of distortion.
40. Apparatus as in claim 39 in which a measure of distortion distribution E_(E) is related to the sum over all time intervals (i) and spectral bands (j) of the value: -a(i,j)·ln(a(i,j)), where a(i,j) is the absolute magnitude of distortion in a predetermined time interval (i), and spectral band (j), expressed as a proportion of the total distortion over all time intervals (i) and spectral bands (j).
41. Apparatus as in claim 37 further comprising means for generating a measure of temporal correlation between distortion and the test signal, and to generate said subjective impact measure in dependence upon said temporal correlation measure.
42. Apparatus as in claim 41 wherein said means for generating a measure of temporal correlation utilizes a temporally displaced version of the test signal.
43. Apparatus as in claim 37, in which the means for generating a measure of the subjective impact of distortion estimates the extent to which the distortion will be perceptible to a human listener.
44. Apparatus as in claim 43 in which the means for generating a measure of the subjective impact of distortion performs spectral and temporal masking calculations.
45. Apparatus as in claim 37 in which the means for generating a measure of the subjective impact of distortion performs a pitch analysis upon said distorted signal, in which said spectral bands comprise pitch bands.
46. Apparatus as in claim 37 further comprising a signal generator for supplying a test signal which has a spectral resemblance to human speech.
47. Apparatus as in claim 46, in which said test signal does not correspond to a single speaker conveying intelligent content speech.
48. Apparatus as in claim 46 in which: the signal generator generates a test signal which comprises a sequence formed of a predetermined, small, number of speech segments, the speech signal comprising several different portions including said segments such that said segments are presented in several different temporal contexts within said sequence, so as to vary the effects on each segment of time varying distortions in the telecommunications apparatus.
49. Apparatus as in claim 37 in which the means for generating a measure of the subjective impact of distortion analyzes the distorted signal, and forms, for each spectral band over each time interval, a measure of the difference between the distorted signal and the test signal.
50. Apparatus as in claim 37 in which the said time intervals are longer for lower frequency spectral component signals than for higher frequency spectral component signals.
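Illustrative note (not part of the claims): the distribution measure E_(E) recited in claims 27 and 40, and the weighted sum of claims 25 and 38, can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names, the list-of-lists layout (rows as time intervals i, columns as spectral bands j), the treatment of zero-valued cells as contributing nothing, and the weights passed to `opinion_score_estimate` are all hypothetical and are not taken from the claims, which do not specify weight values.

```python
import math


def total_distortion(distortion):
    """Sum of absolute distortion magnitudes over all time intervals i and bands j."""
    return sum(abs(v) for row in distortion for v in row)


def error_entropy(distortion):
    """Sum over (i, j) of -a(i,j) * ln(a(i,j)), where a(i,j) is the absolute
    distortion in cell (i, j) as a proportion of the total distortion.

    Assumption: cells with zero distortion contribute nothing (limit of a*ln(a)
    as a -> 0), and an all-zero input yields an entropy of 0.
    """
    total = total_distortion(distortion)
    if total == 0:
        return 0.0
    entropy = 0.0
    for row in distortion:
        for v in row:
            a = abs(v) / total
            if a > 0:
                entropy -= a * math.log(a)
    return entropy


def opinion_score_estimate(distortion, w_total, w_entropy):
    """Weighted sum of total distortion and error entropy.

    The weights are hypothetical placeholders; the claims give no values.
    """
    return w_total * total_distortion(distortion) + w_entropy * error_entropy(distortion)


# Two distortion surfaces with the same total but different distributions.
concentrated = [[8.0, 0.0], [0.0, 0.0]]  # all distortion in one time/frequency cell
spread = [[2.0, 2.0], [2.0, 2.0]]        # distortion spread evenly over four cells

print(total_distortion(concentrated), total_distortion(spread))
print(error_entropy(concentrated), error_entropy(spread))
```

Both example surfaces carry the same total distortion, yet the entropy is zero for the concentrated case and ln(4) for the evenly spread case, which is the behaviour recited in claims 24 and 37: the measure varies with the distribution of distortion even when the aggregate remains constant.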